Modular Networks: Learning to Decompose Neural Computation
Scaling model capacity has been vital in the success of deep learning. For a
typical network, necessary compute resources and training time grow
dramatically with model size. Conditional computation is a promising way to
increase the number of parameters with a relatively small increase in
resources. We propose a training algorithm that flexibly chooses neural modules
based on the data to be processed. Both the decomposition and modules are
learned end-to-end. In contrast to existing approaches, training does not rely
on regularization to enforce diversity in module use. We apply modular networks
both to image recognition and language modeling tasks, where we achieve
superior performance compared to several baselines. Introspection reveals that
modules specialize in interpretable contexts.
Comment: NIPS 2018
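The conditional-computation idea in this abstract, a controller choosing which module processes each input, can be illustrated with a minimal hard-routing forward pass. The shapes, the linear modules, and the argmax controller below are illustrative assumptions, not the paper's training algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: K candidate modules, each a small linear map.
K, d_in, d_out = 4, 8, 8
modules = [rng.normal(size=(d_in, d_out)) for _ in range(K)]
controller = rng.normal(size=(d_in, K))   # scores the K modules per input

def forward(x):
    """Route each input through the single highest-scoring module."""
    scores = x @ controller               # (batch, K) module scores
    choice = scores.argmax(axis=1)        # hard, data-dependent selection
    out = np.empty((x.shape[0], d_out))
    for k in range(K):
        mask = choice == k
        if mask.any():                    # only chosen modules do any work
            out[mask] = x[mask] @ modules[k]
    return out, choice

x = rng.normal(size=(16, d_in))
y, choice = forward(x)
print(y.shape, np.bincount(choice, minlength=K))
```

Only the selected module's parameters touch each input, which is what lets parameter count grow faster than per-example compute; learning this decomposition end-to-end without diversity regularizers is the paper's contribution and is not attempted in this sketch.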
Transfer Learning for Speech Recognition on a Budget
End-to-end training of automated speech recognition (ASR) systems requires
massive data and compute resources. We explore transfer learning based on model
adaptation as an approach for training ASR models under constrained GPU memory,
throughput and training data. We conduct several systematic experiments
adapting a Wav2Letter convolutional neural network originally trained for
English ASR to the German language. We show that this technique allows faster
training on consumer-grade resources while requiring less training data in
order to achieve the same accuracy, thereby lowering the cost of training ASR
models in other languages. Model introspection revealed that small adaptations
to the network's weights were sufficient for good performance, especially for
inner layers.
Comment: Accepted for the 2nd ACL Workshop on Representation Learning for NLP
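The model-adaptation strategy described above amounts to updating only a subset of a pretrained network's weights while the rest stay frozen. A minimal sketch, assuming a toy dictionary of layer weights rather than the actual Wav2Letter architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained layer stack (not Wav2Letter).
layers = {f"conv{i}": rng.normal(size=(32, 32)) for i in range(6)}

# Adapt only a chosen subset of layers; everything else stays frozen,
# which is what keeps GPU memory and data requirements low.
trainable = {"conv0", "conv5"}

def sgd_step(grads, lr=1e-2):
    """Apply a gradient step, skipping frozen layers."""
    for name, g in grads.items():
        if name in trainable:
            layers[name] -= lr * g

before = {k: v.copy() for k, v in layers.items()}
sgd_step({k: np.ones_like(v) for k, v in layers.items()})
changed = sorted(k for k in layers if not np.allclose(layers[k], before[k]))
print(changed)   # prints ['conv0', 'conv5']
```

Which layers to unfreeze is a design choice; the abstract's introspection result (small changes to inner layers suffice) suggests the adaptable subset can be small.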
The Benefits of Model-Based Generalization in Reinforcement Learning
Model-Based Reinforcement Learning (RL) is widely believed to have the
potential to improve sample efficiency by allowing an agent to synthesize large
amounts of imagined experience. Experience Replay (ER) can be considered a
simple kind of model, which has proved extremely effective at improving the
stability and efficiency of deep RL. In principle, a learned parametric model
could improve on ER by generalizing from real experience to augment the dataset
with additional plausible experience. However, owing to the many design choices
involved in empirically successful algorithms, it can be very hard to establish
where the benefits are actually coming from. Here, we provide theoretical and
empirical insight into when, and how, we can expect data generated by a learned
model to be useful. First, we provide a general theorem motivating how learning
a model as an intermediate step can narrow down the set of possible value
functions more than learning a value function directly from data using the
Bellman equation. Second, we provide an illustrative example showing
empirically how a similar effect occurs in a more concrete setting with neural
network function approximation. Finally, we provide extensive experiments
showing the benefit of model-based learning for online RL in environments with
combinatorial complexity, but factored structure that allows a learned model to
generalize. In these experiments, we take care to control for other factors in
order to isolate, insofar as possible, the benefit of using experience
generated by a learned model relative to ER alone.
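The contrast the abstract draws between replaying stored transitions (ER) and generating additional experience from a learned model can be made concrete with a Dyna-style sketch; the chain MDP, the tabular stand-in "model", and all constants below are invented for illustration:

```python
import random

random.seed(0)

# Toy deterministic chain MDP: states 0..4, actions in {-1, +1},
# reward 1 on reaching (or staying at) state 4.
def env_step(s, a):
    s2 = max(0, min(4, s + a))
    return s2, float(s2 == 4)

# Collect real experience into a replay buffer with two fixed sweeps.
buffer = []
s = 0
for a in ([1] * 6 + [-1] * 6) * 2:
    s2, r = env_step(s, a)
    buffer.append((s, a, r, s2))
    s = s2

# Stand-in "learned model": a table fit to the observed transitions
# (the abstract's parametric models would generalize beyond the data).
model = {(s, a): (s2, r) for s, a, r, s2 in buffer}

# Dyna-style updates: each real (ER) sample seeds a short imagined rollout.
Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
gamma, alpha = 0.9, 0.5
for _ in range(200):
    s, a, r, s2 = random.choice(buffer)            # real experience
    for _ in range(5):                             # model-generated experience
        target = r + gamma * max(Q[(s2, b)] for b in (-1, 1))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s, a = s2, random.choice([-1, 1])
        if (s, a) not in model:
            break
        s2, r = model[(s, a)]
print(round(Q[(3, 1)], 3))
```

A tabular lookup cannot generalize, so it only illustrates the update mechanics; the paper's point is precisely that a parametric model can produce plausible transitions the buffer never contained.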
Presentation of the first issue of Travail et apprentissages. Revue de didactique professionnelle
There is decidedly something new in the analysis of work! From the pen of Pierre Roche, in this same section of issue 99 of Formation Emploi, one could read a presentation of Dominique Lhuillier's book "Cliniques du Travail" (Roche, 2007). Today it is a new journal, "Travail et Apprentissages - Revue de Didactique Professionnelle", whose first issue appeared in February 2008. Published with the support of the association "Recherches et pratiques en didactiq..
Causal diffusion and its backwards diffusion problem
This article starts over the backwards diffusion problem by replacing the
\emph{noncausal} diffusion equation, the direct problem, by the \emph{causal}
diffusion model developed in \cite{Kow11} for the case of constant diffusion
speed. For this purpose we derive an analytic representation of the Green
function of causal diffusion in the wave vector-time space for arbitrary (wave
vector) dimension N. We prove that the respective backwards diffusion problem
is ill-posed, but not exponentially ill-posed, if the data acquisition time is
larger than a characteristic time period for space dimensions N >= 2. In
contrast to the noncausal case, the inverse problem is
well-posed for N=1. Moreover, we perform a theoretical and numerical comparison
between causal and noncausal diffusion in the \emph{space-time domain} and the
\emph{wave vector-time domain}. The paper is concluded with numerical
simulations of the backwards diffusion problem via the Landweber method.
Comment: In the replacement I have rewritten the abstract and the
introduction. Moreover, I have added Remark 1 and slightly simplified the
proof of Theorem 4. Reference 25 is updated, since the paper is now published.
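The Landweber method mentioned in the closing sentence iterates x <- x + omega * A^T (y - A x) for a linear problem A x = y, with 0 < omega < 2 / ||A||^2. A small self-contained sketch on a made-up, badly conditioned operator (not the causal-diffusion operator from the paper):

```python
import numpy as np

# Toy ill-posed linear problem A x = y; the operator is invented
# purely to show the iteration's behavior.
A = np.array([[1.0, 0.0],
              [0.0, 0.1]])
x_true = np.array([1.0, 2.0])
y = A @ x_true

# Landweber iteration with step size below the stability bound 2/||A||^2.
omega = 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(2)
for _ in range(2000):
    x = x + omega * A.T @ (y - A @ x)

print(x)
```

The poorly resolved component converges slowly, so with noisy data one stops the iteration early; the stopping index then acts as the regularization parameter, which is why the method suits ill-posed backwards problems.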
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
The Languini Kitchen serves as both a research collective and codebase
designed to empower researchers with limited computational resources to
contribute meaningfully to the field of language modelling. We introduce an
experimental protocol that enables model comparisons based on equivalent
compute, measured in accelerator hours. The number of tokens on which a model
is trained is defined by the model's throughput and the chosen compute class.
Notably, this approach avoids constraints on critical hyperparameters which
affect total parameters or floating-point operations. For evaluation, we
pre-process an existing large, diverse, and high-quality dataset of books that
surpasses existing academic benchmarks in quality, diversity, and document
length. On it, we compare methods based on their empirical scaling trends which
are estimated through experiments at various levels of compute. This work also
provides two baseline models: a feed-forward model derived from the GPT-2
architecture and a recurrent model in the form of a novel LSTM with ten-fold
throughput. While the GPT baseline achieves better perplexity throughout all
our levels of compute, our LSTM baseline exhibits a predictable and more
favourable scaling law. This is due to the improved throughput and the need for
fewer training tokens to achieve the same decrease in test perplexity.
Extrapolating the scaling laws of both models results in an intersection
at roughly 50,000 accelerator hours. We hope this work can serve as the
foundation for meaningful and reproducible language modelling research.
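The accounting described above (a token budget fixed by measured throughput times the compute class, and two empirical scaling curves that cross when extrapolated) can be illustrated numerically; the throughput figure and power-law coefficients below are made up, not the paper's fits:

```python
import math

def tokens_for_budget(tokens_per_hour, accelerator_hours):
    """Token budget implied by measured throughput and the compute class."""
    return tokens_per_hour * accelerator_hours

def intersection(a1, b1, a2, b2):
    """Budget C where a1 * C**-b1 == a2 * C**-b2 (requires b1 != b2)."""
    return math.exp(math.log(a1 / a2) / (b1 - b2))

# Made-up coefficients: curve 2 starts worse (larger a) but scales
# better (larger b), so it overtakes curve 1 at some budget C.
C = intersection(a1=5.0, b1=0.06, a2=6.0, b2=0.08)
print(tokens_for_budget(2_000_000, 6), round(C))
```

The same two-power-law calculation is what an "intersection at roughly 50,000 accelerator hours" amounts to, once each model's loss-versus-compute curve has been fit at several budgets.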
A test of positive suggestions about side effects as a way of enhancing the analgesic response to NSAIDs
Side effects are frequent in pharmacological pain management, potentially preceding analgesia and limiting drug tolerability. Discussing side effects is part of informed consent, yet can favor nocebo effects. This study aimed to test whether a positive suggestion regarding side effects, which could act as reminders of the medication having been absorbed, might favor analgesia in a clinical interaction model. Sixty-six healthy males participated in a study "to validate pupillometry as an objective measure of analgesia". Participants were unknowingly randomized double-blind to positive vs control information about side effects embedded in a video regarding the study drugs. Sequences of moderately painful heat stimuli applied before and after treatment with diclofenac and atropine served to evaluate analgesia. Atropine was deceptively presented as a co-analgesic, but was used to induce side effects. Adverse events (AE) were collected with the General Assessment of Side Effects (GASE) questionnaire prior to the second induced pain sequence. Debriefing fully informed participants regarding the purpose of the study and showed them the two videos. The combination of medications led to significant analgesia, without a between-group difference. Positive information about side effects increased the attribution of AEs to the treatment compared to the control information. The total GASE score was correlated with analgesia, i.e., the more AEs reported, the stronger the analgesia. Interestingly, there was a significant between-group difference in this correlation: the GASE score and analgesia correlated only in the positive information group. This provides evidence for a selective link between AEs and pain relief in the group who received the suggestion that AEs could be taken as a sign "that help was on the way". During debriefing, 65% of participants said they would prefer to receive the positive message in a clinical context.
Although the present results cannot be translated immediately to clinical pain conditions, they do indicate the importance of testing this type of modulation in a clinical context.